If the matrix $A$ is large and sparse and/or if only some eigenvalues and their eigenvectors are desired, iterative methods are the methods of choice. For example, the power method can be useful to compute the eigenvalue with the largest modulus. The basic operation in the power method is matrix-vector multiplication, and this can be performed very fast if $A$ is sparse. Moreover, $A$ need not be stored in the computer -- the input for the algorithm can be just a function which, given some vector $x$, returns the product $Ax$.
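As an illustration (the function name, tolerance, and iteration limit below are assumptions, not part of the original text), a minimal sketch of the power method that touches $A$ only through a routine returning $Ax$ might look as follows:
using LinearAlgebra
# Minimal power-method sketch; `mul` is any routine that returns A*x,
# so the matrix A itself never needs to be stored.
function powermethod(mul::Function, x::AbstractVector; tol=1e-10, maxiter=1000)
    x = x / norm(x)
    λ = zero(eltype(x))
    for _ = 1:maxiter
        y = mul(x)                       # the only operation involving A
        λ = x ⋅ y                        # Rayleigh quotient ≈ dominant eigenvalue
        norm(y - λ * x) ≤ tol * abs(λ) && break
        x = y / norm(y)
    end
    λ, x
end
# Usage sketch: λ, v = powermethod(x -> A * x, rand(size(A, 1)))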
An improved version of the power method, which efficiently computes some eigenvalues (either largest in modulus or near some target value $\mu$) and the corresponding eigenvectors, is the Lanczos method.
For more details, see I. Slapničar, Symmetric Matrix Eigenvalue Techniques and the references therein.
The reader should be familiar with concepts of eigenvalues and eigenvectors, related perturbation theory, and algorithms.
The reader should be able to recognise matrices which warrant the use of the Lanczos method, to apply the method, and to assess the accuracy of the solution.
$A$ is a real symmetric matrix of order $n$.
Given a nonzero vector $x$ and an index $k<n$, the Krylov matrix is defined as
$$ K_k=\begin{bmatrix} x & Ax & A^2 x &\cdots & A^{k-1}x \end{bmatrix}. $$ The Krylov subspace is the subspace spanned by the columns of $K_k$.
The Lanczos method is based on the following observation. If $K_k=XR$ is the $QR$ factorization of the matrix $K_k$, then the $k\times k$ matrix $T=X^T A X$ is tridiagonal. The matrices $X$ and $T$ can be computed using only $k$ matrix-vector products and $O(kn)$ additional operations.
Let $T=Q\Lambda Q^T$ be the EVD of $T$. Then $\lambda_i$ approximate well some of the largest and smallest eigenvalues of $A$, and the columns of the matrix $U=XQ$ approximate the corresponding eigenvectors.
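The following snippet is not from the original text; it is a small sketch, with assumed sizes and random data, that illustrates the observation above: the orthonormal factor $X$ of the QR factorization of $K_k$ makes $X^T A X$ numerically tridiagonal, and the extreme eigenvalues of $T$ are already close to the extreme eigenvalues of $A$.
using LinearAlgebra, Random
Random.seed!(123)
n, k = 50, 8                         # assumed sizes for the illustration
A = Matrix(Symmetric(randn(n, n)))
x = randn(n)
Kk = zeros(n, k)                     # Krylov matrix [x  Ax  A²x ⋯ A^(k-1)x]
Kk[:, 1] = x
for j = 2:k
    Kk[:, j] = A * Kk[:, j-1]
end
X = Matrix(qr(Kk).Q)                 # orthonormal basis of the Krylov subspace
T = X' * A * X                       # k×k and numerically tridiagonal
norm(T - Tridiagonal(T))             # off-tridiagonal part is small (limited by roundoff and cond(Kk))
extrema(eigvals(Symmetric(T))), extrema(eigvals(Symmetric(A)))  # extreme Ritz values vs. extreme eigenvalues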
As $k$ increases, the largest (smallest) eigenvalues of the matrix $T_{1:k,1:k}$ converge towards some of the largest (smallest) eigenvalues of $A$ (due to the Cauchy interlace property). The algorithm can be redesigned to compute only the largest or smallest eigenvalues. Also, by using the shift-and-invert strategy, the method can be used to compute eigenvalues near some specified value. In order to obtain better approximations, $k$ should be greater than the number of required eigenvalues. On the other hand, in order to retain efficiency and numerical stability, $k$ should be as small as possible.
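To make the shift-and-invert idea concrete, here is a small sketch (not from the original text; the matrix, the target $\sigma$, and the iteration count are assumed for illustration). Eigenvalues $\lambda$ of $A$ near $\sigma$ become the largest-modulus eigenvalues $\theta=1/(\lambda-\sigma)$ of $(A-\sigma I)^{-1}$, so an iteration that only solves systems with $A-\sigma I$ converges to them.
using LinearAlgebra
# Illustrative sketch of shift-and-invert: power iteration applied to (A-σI)⁻¹.
function shiftinvert_sketch(A, σ; iters=200)
    F = lu(A - σ * I)                 # factorize once, reuse for every solve
    x = normalize(randn(size(A, 1)))
    θ = 0.0
    for _ = 1:iters
        y = F \ x                     # y = (A - σI)⁻¹ x without forming the inverse
        θ = x ⋅ y                     # Rayleigh quotient for the inverse
        x = normalize(y)
    end
    σ + 1 / θ                         # ≈ eigenvalue of A closest to σ
end
# Usage sketch: shiftinvert_sketch(Matrix(Symmetric(randn(20, 20))), 0.5)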
The last computed element, $\mu=T_{k+1,k}$, provides information about accuracy: \begin{align*} \|AU-U\Lambda\|_2&=\mu, \\ \|AU_{:,i}-\lambda_i U_{:,i}\|_2&=\mu |Q_{ki}|, \quad i=1,\ldots,k. \end{align*} Further, there are $k$ eigenvalues $\tilde\lambda_1,\ldots,\tilde\lambda_k$ of $A$ such that $|\lambda_i-\tilde\lambda_i|\leq \mu$, and for the corresponding eigenvectors, we have $$\sin2\Theta(U_{:,i},\tilde U_{:,i}) \leq \frac{2\mu}{\min_{j\neq i} |\lambda_i-\tilde \lambda_j|}.$$
In practical implementations, $\mu$ is usually used to determine the index $k$.
The Lanczos method has inherent numerical instability in floating-point arithmetic: since the Krylov vectors are, in fact, generated by the power method, they converge towards an eigenvector of $A$. Thus, as $k$ increases, the Krylov vectors become more and more parallel, the recursion in the function myLanczos() becomes numerically unstable, and the computed columns of $X$ cease to be sufficiently orthogonal. This affects both the convergence and the accuracy of the algorithm. For example, several eigenvalues of $T$ may converge towards a simple eigenvalue of $A$ (the so-called ghost eigenvalues).
The loss of orthogonality is dealt with by using the full reorthogonalization procedure: in each step, the new ${\bf z}$ is orthogonalized against all previous columns of $X$, that is, in the function myLanczos(), the formula
z=z-dv[i]*X[:,i]-ev[i-1]*X[:,i-1]
is replaced by
z=z-sum([(z⋅X[:,j])*X[:,j] for j=1:i])
To obtain better orthogonality, the latter formula is usually executed twice. The full reorthogonalization raises the operation count to $O(k^2n)$.
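For illustration (the helper below is hypothetical, not part of myLanczos()), the twice-executed full reorthogonalization of the current vector z against the first i columns of X could be written as:
using LinearAlgebra
# Hypothetical helper: full reorthogonalization of z against X[:,1:i], executed twice.
function reorthogonalize!(z::AbstractVector, X::AbstractMatrix, i::Int)
    for _ = 1:2                                   # the second pass improves orthogonality
        z .-= sum([(z ⋅ X[:, j]) * X[:, j] for j = 1:i])
    end
    z
end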
Selective reorthogonalization is a procedure in which the current $z$ is orthogonalized against only some selected columns of $X$, in order to attain sufficient numerical stability without increasing the operation count too much. The details are very subtle and can be found in the references.
The Lanczos method is usually used for sparse matrices. A sparse matrix $A$ is stored in a sparse format in which only the values and indices of the nonzero elements are stored. The number of operations required to multiply a vector by $A$ is then proportional to the number of nonzero elements.
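As a brief illustration (the size and density below are assumed, not from the original text), sparse storage and the cost of the matrix-vector product in Julia:
using SparseArrays
n = 10_000
S = sprand(n, n, 1e-3) |> t -> t + t'   # sparse symmetric matrix with ≈ 0.2% nonzeros
nnz(S)                                  # only the nonzero values and their indices are stored
x = rand(n)
y = S * x                               # O(nnz(S)) operations instead of O(n²)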
The function eigs()
implements the Lanczos method for real symmetric matrices and the more general Arnoldi method
for general matrices.
In [1]:
using LinearAlgebra
In [16]:
function myLanczos(A::Array{T}, x::Vector{T}, k::Int) where T
    # Lanczos method with full reorthogonalization.
    # Returns Ritz values, Ritz vectors, the orthonormal basis X, and the residual norm μ.
    n=size(A,1)
    X=Array{T}(undef,n,k)
    dv=Array{T}(undef,k)
    ev=Array{T}(undef,k-1)
    X[:,1]=x/norm(x)
    for i=1:k-1
        z=A*X[:,i]
        dv[i]=X[:,i]⋅z
        # Three-term recursion
        if i==1
            z=z-dv[i]*X[:,i]
        else
            # z=z-dv[i]*X[:,i]-ev[i-1]*X[:,i-1]
            # Full reorthogonalization - once or even twice
            z=z-sum([(z⋅X[:,j])*X[:,j] for j=1:i])
            # z=z-sum([(z⋅X[:,j])*X[:,j] for j=1:i])
        end
        μ=norm(z)
        if μ==0
            # Lucky breakdown - the Krylov subspace is invariant
            Tr=SymTridiagonal(dv[1:i-1],ev[1:i-2])
            return eigvals(Tr), X[:,1:i-1]*eigvecs(Tr), X[:,1:i-1], μ
        else
            ev[i]=μ
            X[:,i+1]=z/μ
        end
    end
    # Last step
    z=A*X[:,end]
    dv[end]=X[:,end]⋅z
    z=z-dv[end]*X[:,end]-ev[end]*X[:,end-1]
    μ=norm(z)
    Tr=SymTridiagonal(dv,ev)
    eigvals(Tr), X*eigvecs(Tr), X, μ
end
Out[16]:
In [17]:
using Random
Random.seed!(421)
n=100
A=Matrix(Symmetric(rand(n,n)))
# Or: A = rand(n,n) |> t -> t + t'
x=rand(n)
k=10
Out[17]:
In [18]:
λ,U,X,μ=myLanczos(A,x,k)
Out[18]:
In [19]:
# Orthogonality
norm(X'*X-I)
Out[19]:
In [20]:
X'*A*X
Out[20]:
In [21]:
# Residual
norm(A*U-U*Diagonal(λ)), μ
Out[21]:
In [22]:
U'*A*U
Out[22]:
In [23]:
# Orthogonality
opnorm(U'*U-I)
Out[23]:
In [24]:
# Full eigenvalue decomposition
λeigen,Ueigen=eigen(A);
In [25]:
using Arpack
In [26]:
?eigs
Out[26]:
In [48]:
# Lanczos method implemented in Julia
λeigs,Ueigs=eigs(A; nev=k, which=:LM, ritzvec=true, v0=x)
Out[48]:
In [28]:
[λ λeigs λeigen[sortperm(abs.(λeigen),rev=true)[1:k]] ]
Out[28]:
We see that eigs() computes the k eigenvalues with the largest modulus. Which eigenvalues did myLanczos() compute?
In [29]:
for i=1:k
println(minimum(abs,λeigen.-λ[i]))
end
The conclusion is that the naive implementation of the Lanczos method is not sufficient. However, it is fine when all eigenvalues are computed. Why?
In [30]:
λall,Uall,Xall,μall=myLanczos(A,x,100)
Out[30]:
In [31]:
# Residual and relative errors
norm(A*Uall-Uall*Diagonal(λall)), norm((λeigen-λall)./λeigen)
Out[31]:
In [32]:
methods(eigs);
We can use the Lanczos method with a linear operator which, given a vector x, returns the product A*x. We use the function LinearMap() from the package LinearMaps.jl.
In [33]:
# Need Pkg.add("LinearMaps")
using LinearMaps
In [35]:
methods(LinearMap);
In [36]:
# Operator from the matrix
C=LinearMap(A)
Out[36]:
In [37]:
λC,UC=eigs(C; nev=k, which=:LM, ritzvec=true, v0=x)
λeigs-λC
Out[37]:
Here is an example of LinearMap() defined by a function.
In [38]:
f(x)=A*x
Out[38]:
In [39]:
D=LinearMap(f,n,issymmetric=true)
Out[39]:
In [40]:
λD,UD=eigs(D, nev=k, which=:LM, ritzvec=true, v0=x)
λeigs-λD
Out[40]:
In [41]:
using SparseArrays
In [42]:
?sprand
Out[42]:
In [43]:
# Generate a sparse symmetric matrix
C=sprand(n,n,0.05) |> t -> t+t'
Out[43]:
In [44]:
issymmetric(C)
Out[44]:
In [49]:
λ,U=eigs(C; nev=k, which=:LM, ritzvec=true, v0=x)
λ
Out[49]: